216        Bioinformatics

of the ChIP-Seq signal may vary depending on the binding protein studied. The ChIP-Seq signal can be sharp, broad, or a mix of both. A sharp signal characterizes the binding site of a TF, which binds to a specific short DNA sequence called a motif. Histones form broad ChIP-Seq signals because they span several nucleosomes and can therefore cover long stretches of DNA. RNA polymerase II (Pol II) initiates transcription by binding to the promoter region of a gene and then moves along the gene as it transcribes messenger RNA. Therefore, the ChIP-Seq signal of Pol II may include both sharp and broad components (Figure 6.1).

Peak-calling programs scan the genome with sliding windows to find these patterns, locating binding regions by counting tags from both the Watson (forward) and Crick (reverse) strands. Watson tags pile up on one side of the binding site and Crick tags on the other, producing a bimodal pattern. For both sets of tags to fall in a single window, they must be shifted toward the center: Watson tags are shifted toward the 3′ end and Crick tags toward the 5′ end (relative to the reference strand), forming a single peak over the putative binding site. Peak-calling programs such as MACS exploit this bimodal pattern to empirically model the shift size and thereby precisely locate the binding sites on the genomic DNA sequence [2].
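To make the shifting step concrete, here is a minimal Python sketch. It assumes tags are given as integer 5′-end coordinates on one chromosome, and it estimates the shift size d with a deliberately simplified, hypothetical stand-in for the MACS model: the distance between the most frequent Watson and Crick tag positions in a single enriched region (MACS itself builds its model from many enriched regions). The function names `estimate_shift`, `shift_tags`, and `window_counts` are illustrative only and are not part of any tool.

```python
import bisect
from collections import Counter

def estimate_shift(watson, crick):
    """Estimate shift size d for one enriched region (hypothetical simplification).

    Watson tags pile up upstream of the binding site and Crick tags downstream,
    giving a bimodal pattern; d is taken as the distance between the two modes.
    """
    watson_mode = Counter(watson).most_common(1)[0][0]
    crick_mode = Counter(crick).most_common(1)[0][0]
    return crick_mode - watson_mode

def shift_tags(watson, crick, d):
    """Move each tag d/2 toward the putative binding site."""
    half = d // 2
    # Watson tags move right (toward the 3' end of the reference strand),
    # Crick tags move left (toward its 5' end).
    return [p + half for p in watson] + [p - half for p in crick]

def window_counts(positions, chrom_len, window, step):
    """Count tags in sliding windows [start, start + window) along one chromosome."""
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_len - window + 1, step):
        n = bisect.bisect_left(pos, start + window) - bisect.bisect_left(pos, start)
        counts.append((start, n))
    return counts

# Toy region: Watson 5' ends cluster near 100, Crick 5' ends near 300.
watson = [98, 100, 100, 100, 103]
crick = [297, 300, 300, 300, 302]
d = estimate_shift(watson, crick)
shifted = shift_tags(watson, crick, d)
best = max(window_counts(shifted, 500, 100, 50), key=lambda c: c[1])
print(d, best)  # 200 (150, 10)
```

Without the shift, no 100 bp window in this toy example holds more than the 5 tags of one strand; after shifting, all 10 tags fall in the window starting at 150, which is the single peak the text describes.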

Peak calling is a step unique to ChIP-Seq data analysis; it aims to identify the genomic regions that are occupied by the protein of interest and are therefore enriched by the ChIP. Peak calling is based on the abundance of aligned reads in a sliding window, normalized by input reads, and uses statistical tests to determine peak significance.
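The significance test can be sketched in the same spirit. The sketch below assumes a Poisson model for window counts, as MACS uses, but it simplifies the MACS dynamic local background to the larger of two rates: an even genome-wide rate, and the input count in the same window scaled to the ChIP library size. With no input, it falls back to the even background alone. The functions `poisson_sf` and `window_pvalue` and their parameters are hypothetical names chosen for illustration.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summed directly over the upper tail.

    Intended for moderate lam (up to a few hundred), as in per-window counts.
    """
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # P(X = k), computed in log space to avoid overflow in lam**k / k!
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    while term > 0.0:
        total += term
        i += 1
        term *= lam / i  # P(X = i) from P(X = i - 1)
        if term < total * 1e-16:  # remaining terms are negligible
            break
    return min(1.0, total)

def window_pvalue(chip_count, control_count, chip_total, control_total,
                  window, genome_size):
    """Poisson p-value for one window with a simplified, MACS-like background."""
    # Even background: ChIP tags spread uniformly over the genome.
    lam = chip_total * window / genome_size
    if control_total > 0:
        # Input count in the same window, scaled to the ChIP library size.
        lam = max(lam, control_count * chip_total / control_total)
    return poisson_sf(chip_count, lam)

# 1 million ChIP tags, 100 Mb genome, 200 bp windows: background of 2 tags/window.
print(window_pvalue(50, 0, 1_000_000, 0, 200, 100_000_000))          # tiny: strong peak
print(window_pvalue(10, 20, 1_000_000, 1_000_000, 200, 100_000_000))  # near 1: input is high too
```

Small p-values mark windows where ChIP tags clearly exceed what the background predicts; because the genome is scanned with many windows, real peak callers also correct for multiple testing.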

The ChIP-Seq tags are usually normalized by input reads (control), but some peak callers can also call peaks without input reads. Instead, they assume an even background signal


FIGURE 6.2  ChIP-Seq read alignment. (The peaks represent reads aligned to the reference genome.)

FIGURE 6.1  Sharp signal (TF), broad signal (histones), and mixed signal (Pol II).